image schema
Towards a Neurosymbolic Reasoning System Grounded in Schematic Representations
Olivier, François, Bouraoui, Zied
Despite significant progress in natural language understanding, Large Language Models (LLMs) remain error-prone when performing logical reasoning, often lacking the robust mental representations that enable human-like comprehension. We introduce a prototype neurosymbolic system, Embodied-LM, that grounds understanding and logical reasoning in schematic representations based on image schemas: recurring patterns derived from sensorimotor experience that structure human cognition. Our system operationalizes the spatial foundations of these cognitive structures using declarative spatial reasoning within Answer Set Programming. Through evaluation on logical deduction problems, we demonstrate that LLMs can be guided to interpret scenarios through embodied cognitive structures, that these structures can be formalized as executable programs, and that the resulting representations support effective logical reasoning with enhanced interpretability. While our current implementation focuses on spatial primitives, it establishes the computational foundation for incorporating more complex and dynamic representations.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- (4 more...)
- Transportation > Passenger (0.50)
- Transportation > Ground > Road (0.50)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
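The spatial primitives that Embodied-LM operationalizes can be illustrated with a toy example in plain Python. This is a hedged sketch of qualitative containment with a naive fixpoint loop, not the paper's ASP implementation; every name here (`Region`, `inside`, `deduce_containment`) is invented for illustration:

```python
# Illustrative only: the paper states such rules declaratively in Answer Set
# Programming; here the same idea is mimicked imperatively in plain Python.
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    """Axis-aligned 2D region, a minimal stand-in for a spatial entity."""
    name: str
    x1: float
    y1: float
    x2: float
    y2: float

def inside(a: Region, b: Region) -> bool:
    """CONTAINMENT primitive: region a lies entirely within region b."""
    return b.x1 <= a.x1 and b.y1 <= a.y1 and a.x2 <= b.x2 and a.y2 <= b.y2

def deduce_containment(regions):
    """Derive all containment facts, including those entailed by transitivity."""
    facts = {(a.name, b.name) for a in regions for b in regions
             if a is not b and inside(a, b)}
    changed = True
    while changed:  # naive fixpoint, mirroring rule application in ASP
        changed = False
        for (x, y) in list(facts):
            for (y2, z) in list(facts):
                if y == y2 and (x, z) not in facts:
                    facts.add((x, z))
                    changed = True
    return facts

coin = Region("coin", 2, 2, 3, 3)
box = Region("box", 1, 1, 4, 4)
room = Region("room", 0, 0, 10, 10)
print(sorted(deduce_containment([coin, box, room])))
```

In the actual system, such rules would be expressed declaratively and evaluated by an ASP grounder/solver rather than looped over by hand.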
Hanging Around: Cognitive Inspired Reasoning for Reactive Robotics
Pomarlan, Mihai, De Giorgis, Stefano, Ringe, Rachel, Hedblom, Maria M., Tsiogkas, Nikolaos
Situationally-aware artificial agents operating with competence in natural environments face several challenges: spatial awareness, object affordance detection, dynamic changes and unpredictability. A critical challenge is the agent's ability to identify and monitor environmental elements pertinent to its objectives. Our research introduces a neurosymbolic modular architecture for reactive robotics. Our system combines a neural component performing object recognition over the environment and image processing techniques such as optical flow, with symbolic representation and reasoning. The reasoning system is grounded in the embodied cognition paradigm, via integrating image schematic knowledge in an ontological structure. The ontology is operatively used to create queries for the perception system, decide on actions, and infer entities' capabilities derived from perceptual data. The combination of reasoning and image processing allows the agent to focus its perception for normal operation as well as discover new concepts for parts of objects involved in particular interactions. The discovered concepts allow the robot to autonomously acquire training data, adjust its subsymbolic perception to recognize the parts, and make planning for more complex tasks feasible by focusing search on the relevant object parts. We demonstrate our approach in a simulated world, in which an agent learns to recognize parts of objects involved in support relations. While the agent initially has no concept of a handle, by observing examples of supported objects hanging from a hook it learns to recognize the parts involved in establishing support and becomes able to plan the establishment/destruction of the support relation. This underscores the agent's capability to expand its knowledge through observation in a systematic way, and illustrates the potential of combining deep reasoning [...].
- Europe > Germany > Bremen > Bremen (0.14)
- Europe > Italy (0.04)
- North America > United States > New York (0.04)
- (5 more...)
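The rule-driven side of this architecture, deciding what moves when a supporting object moves, can be caricatured in a few lines. The facts and function below are hypothetical stand-ins, not the authors' actual ontology:

```python
# Hedged sketch: the paper couples an ontology with perception; here only the
# symbolic side is mimicked with an invented support-propagation rule.
def moved_objects(support_facts, pushed):
    """If x supports y, moving x also moves y (applied transitively)."""
    moved = {pushed}
    changed = True
    while changed:
        changed = False
        for supporter, supported in support_facts:
            if supporter in moved and supported not in moved:
                moved.add(supported)
                changed = True
    return moved

# Toy scene echoing the paper's demo: a mug hangs from a hook on a rail.
facts = [("rail", "hook"), ("hook", "mug")]
print(moved_objects(facts, "rail"))  # the mug moves along with the rail
```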
Grounding Agent Reasoning in Image Schemas: A Neurosymbolic Approach to Embodied Cognition
Olivier, François, Bouraoui, Zied
Despite advances in embodied AI, agent reasoning systems still struggle to capture the fundamental conceptual structures that humans naturally use to understand and interact with their environment. To address this, we propose a novel framework that bridges embodied cognition theory and agent systems by leveraging a formal characterization of image schemas, which are defined as recurring patterns of sensorimotor experience that structure human cognition. By customizing LLMs to translate natural language descriptions into formal representations based on these sensorimotor patterns, we will be able to create a neurosymbolic system that grounds the agent's understanding in fundamental conceptual structures. We argue that such an approach enhances both efficiency and interpretability while enabling more intuitive human-agent interactions through shared embodied understanding.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- Asia > Thailand > Bangkok > Bangkok (0.05)
- North America > United States > Michigan > Wayne County > Detroit (0.04)
- (5 more...)
Large Language Models for Virtual Human Gesture Selection
Torshizi, Parisa Ghanad, Hensel, Laura B., Shapiro, Ari, Marsella, Stacy C.
Co-speech gestures convey a wide variety of meanings and play an important role in face-to-face human interactions. These gestures significantly influence the addressee's engagement, recall, comprehension, and attitudes toward the speaker. Similarly, they impact interactions between humans and embodied virtual agents. The process of selecting and animating meaningful gestures has thus become a key focus in the design of these agents. However, automating this gesture selection process poses a significant challenge. Prior gesture generation techniques have varied from fully automated, data-driven methods, which often struggle to produce contextually meaningful gestures, to more manual approaches that require specific gesture-crafting expertise, are time-consuming, and lack generalizability. In this paper, we leverage the semantic capabilities of Large Language Models to develop a gesture selection approach that suggests meaningful, appropriate co-speech gestures. We first describe how information on gestures is encoded into GPT-4. Then, we conduct a study to evaluate alternative prompting approaches for their ability to select meaningful, contextually relevant gestures and to align them appropriately with the co-speech utterance. Finally, we detail and demonstrate how this approach has been implemented within a virtual agent system, automating the selection and subsequent animation of the selected gestures for enhanced human-agent interactions.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Michigan > Wayne County > Detroit (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (5 more...)
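A minimal sketch of the prompting idea behind this gesture-selection approach. The lexicon, the keyword table, and the `select_gesture` stub are all hypothetical; a real system would send `build_prompt`'s output to GPT-4 rather than matching keywords:

```python
# Hypothetical gesture lexicon: name -> human-readable form description.
GESTURE_LEXICON = {
    "container": "cupped hands, as if holding an object",
    "path": "hand traces a line from left to right",
    "negation": "open palm sweeps sideways",
}

def build_prompt(utterance: str) -> str:
    """Encode the gesture options into a prompt for an LLM to choose from."""
    options = "\n".join(f"- {name}: {desc}"
                        for name, desc in GESTURE_LEXICON.items())
    return (f'Utterance: "{utterance}"\n'
            f"Choose the most fitting co-speech gesture:\n{options}")

def select_gesture(utterance: str) -> str:
    """Stand-in for the LLM call: pick the gesture whose cue word appears."""
    lowered = utterance.lower()
    cues = {"container": ["keep", "hold", "inside"],
            "path": ["journey", "progress", "towards"],
            "negation": ["not", "never"]}
    for gesture, words in cues.items():
        if any(w in lowered for w in words):
            return gesture
    return "beat"  # default filler gesture when nothing matches

print(build_prompt("We are making progress towards a solution."))
print(select_gesture("We are making progress towards a solution."))
```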
Exploring Spatial Schema Intuitions in Large Language and Vision Models
Wicke, Philipp, Wachowiak, Lennart
Despite the ubiquity of large language models (LLMs) in AI research, the question of embodiment in LLMs remains underexplored, distinguishing them from embodied systems in robotics where sensory perception directly informs physical action. Our investigation navigates the intriguing terrain of whether LLMs, despite their non-embodied nature, effectively capture implicit human intuitions about fundamental, spatial building blocks of language. We employ insights from spatial cognitive foundations developed through early sensorimotor experiences, guiding our exploration through the reproduction of three psycholinguistic experiments. Surprisingly, correlations between model outputs and human responses emerge, revealing adaptability without a tangible connection to embodied experiences. Notable distinctions include polarized language model responses and reduced correlations in vision language models. This research contributes to a nuanced understanding of the interplay between language, spatial experiences, and the computations made by large language models. More at https://cisnlp.github.io/Spatial_Schemas/
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
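The comparison this paper makes, correlating model outputs with human responses from psycholinguistic experiments, reduces to a Pearson correlation; the ratings below are fabricated for illustration only:

```python
# Pure-Python Pearson correlation; no third-party dependencies.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

human_ratings = [6.1, 2.3, 5.8, 1.9, 4.4]  # e.g., spatial-fit judgments
model_scores  = [5.9, 2.8, 5.1, 2.2, 4.0]  # hypothetical LLM outputs
print(round(pearson(human_ratings, model_scores), 3))
```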
META4: Semantically-Aligned Generation of Metaphoric Gestures Using Self-Supervised Text and Speech Representation
Fares, Mireille, Pelachaud, Catherine, Obin, Nicolas
Image Schemas are repetitive cognitive patterns that influence the way we conceptualize and reason about various concepts present in speech. These patterns are deeply embedded within our cognitive processes and are reflected in our bodily expressions including gestures. Particularly, metaphoric gestures possess essential characteristics and semantic meanings that align with Image Schemas to visually represent abstract concepts. The shape and form of gestures can convey abstract concepts, such as extending the forearm and hand or tracing a line with hand movements to visually represent the image schema of PATH. Previous behavior generation models have primarily focused on utilizing speech (acoustic features and text) to drive the generation model of virtual agents. They have not considered key semantic information, such as that carried by Image Schemas, to effectively generate metaphoric gestures. To address this limitation, we introduce META4, a deep learning approach that generates metaphoric gestures from both speech and Image Schemas. Our approach has two primary goals: computing Image Schemas from input text to capture the underlying semantic and metaphorical meaning, and generating metaphoric gestures driven by speech and the computed image schemas. Our approach is the first method for generating speech-driven metaphoric gestures while leveraging the potential of Image Schemas. We demonstrate the effectiveness of our approach and highlight the importance of both speech and image schemas in modeling metaphoric gestures.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.89)
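META4's first stage, computing Image Schemas from input text, can be approximated by a naive keyword baseline. The cue lists below are hypothetical assumptions, not the paper's learned model:

```python
# Hypothetical cue words per image schema; a real system would use a trained
# classifier rather than exact token matching.
SCHEMA_CUES = {
    "PATH": ["towards", "through", "journey", "arrive"],
    "CONTAINMENT": ["in", "inside", "contain", "within"],
    "UP_DOWN": ["rise", "fall", "above", "below"],
}

def detect_schemas(text: str):
    """Return the image schemas whose cue words occur as tokens in the text."""
    tokens = text.lower().split()
    return sorted({schema for schema, cues in SCHEMA_CUES.items()
                   if any(cue in tokens for cue in cues)})

print(detect_schemas("Prices rise as we move towards the deadline"))
```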
Representation Learning of Image Schema
Yunus, Fajrian, Clavel, Chloé, Pelachaud, Catherine
Image schema is a recurrent pattern of reasoning where one entity is mapped into another. Image schema is similar to conceptual metaphor and is also related to metaphoric gesture. Our main goal is to generate metaphoric gestures for an Embodied Conversational Agent. We propose a technique to learn the vector representation of image schemas. As far as we are aware, this is the first work which addresses that problem. Our technique uses Ravenet et al.'s algorithm to compute the image schemas from the text input, and BERT and SenseBERT as the base word embedding techniques to calculate the final vector representation of the image schema. Our representation learning technique works by clustering: word embedding vectors which belong to the same image schema should be relatively closer to each other, and thus form a cluster. With the image schemas representable as vectors, it also becomes possible to have a notion that some image schemas are closer or more similar to each other than to the others, because the distance between the vectors is a proxy of the dissimilarity between the corresponding image schemas. Therefore, after obtaining the vector representation of the image schemas, we calculate the distances between those vectors. Based on these, we create visualizations to illustrate the relative distances between the different image schemas.
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > New Jersey (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.66)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
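The clustering intuition, words of one image schema averaging into a vector whose distances to other schema vectors proxy dissimilarity, can be sketched with toy 2-d embeddings. The paper uses BERT/SenseBERT vectors; all numbers here are fabricated:

```python
# Toy word embeddings (2-d); real embeddings would be high-dimensional.
from math import dist  # Euclidean distance, Python 3.8+

word_vectors = {
    "enter": (0.9, 0.1), "inside": (0.8, 0.2),  # CONTAINMENT-ish words
    "travel": (0.1, 0.9), "route": (0.2, 0.8),  # PATH-ish words
}
schema_words = {"CONTAINMENT": ["enter", "inside"], "PATH": ["travel", "route"]}

def schema_vector(schema):
    """Average the embeddings of a schema's words into one representative vector."""
    vecs = [word_vectors[w] for w in schema_words[schema]]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

c, p = schema_vector("CONTAINMENT"), schema_vector("PATH")
print(c, p, round(dist(c, p), 3))  # inter-schema distance proxies dissimilarity
```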
What Stands-in for a Missing Tool? A Prototypical Grounded Knowledge-based Approach to Tool Substitution
Thosar, Madhura, Mueller, Christian A., Zug, Sebastian
It is not uncommon to find that a tool needed for a certain task is unavailable. However, humans tend to circumvent such hurdles by improvising with a suitable existing object in the environment. A robot that is expected to work alongside humans in the real world is bound to face such obstacles, and an effective way for it to carry on with the task is to find a substitute. Robots that, for instance, have to hammer a nail into a wall should look for a conventional tool, a hammer, or resort to an appropriate substitute in case a hammer is unavailable. Selecting an appropriate substitute requires knowledge-driven deliberation to determine its suitability. Baber (2003a) suggested that humans are aided by conceptual knowledge about objects during the deliberation process. In other words, humans generally have an intuitive understanding of objects, and as such use a qualitative form of knowledge about object properties (thus, conceptual knowledge) obtained from a combination of visual sensations, experiences, and the outcomes of manual investigation to evaluate the applicability of a substitute.
- Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)
- North America > United States > New York > Monroe County > Rochester (0.04)
- Europe > Germany > Bremen > Bremen (0.04)
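One hedged way to read the paper's knowledge-driven deliberation: score candidate substitutes by how much conceptual knowledge they share with the missing tool. The flat property sets and Jaccard scoring below are illustrative assumptions, not the authors' actual model:

```python
# Conceptual knowledge reduced to flat property sets (an illustrative toy).
KNOWLEDGE = {
    "hammer": {"rigid", "heavy", "graspable", "flat_face"},
    "shoe":   {"graspable", "flat_face", "soft"},
    "rock":   {"rigid", "heavy", "graspable"},
    "sponge": {"graspable", "soft", "light"},
}

def jaccard(a: set, b: set) -> float:
    """Overlap of two property sets as a crude suitability score."""
    return len(a & b) / len(a | b)

def best_substitute(missing: str):
    """Rank every known object by property overlap with the missing tool."""
    target = KNOWLEDGE[missing]
    scores = {name: jaccard(props, target)
              for name, props in KNOWLEDGE.items() if name != missing}
    return max(scores, key=scores.get), scores

tool, scores = best_substitute("hammer")
print(tool, scores)  # a rock shares rigidity, weight, and graspability
```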
The Two Paths from Natural Language Processing to Artificial Intelligence – Intuition Machine
AI has accelerated in recent years, especially with deep learning, but current chatbots are an embarrassment. Their deficiency is disappointing because we want to interact with our world using natural language, and we want computers to read all of those documents out there so they can retrieve the best ones, answer our questions, and summarize what is new. To understand our language, computers need to know our world. They need to be able to answer questions like "Why does it only rain outside?" and "If a book is on a table, and you push the table, what happens?" We humans understand language in a way that is grounded in sensation and action. When someone says the word "chicken," we map that to our experience with chickens, and we can talk to each other because we have had similar experiences with chickens. This is how computers need to understand language.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- Europe > Italy (0.04)
- Europe > France (0.04)